The LTER-Hyper-SRB Project: A collection management system for LTER hyperspectral remote sensing data

Issue: Network News Fall 2000, Vol. 13 No. 2
Section: Network News

A collaborative activity between the San Diego Supercomputer Center and the Long Term Ecological Research Network.

Recent developments in remote sensing technologies have provided new methods for gathering and interpreting landscape data. One type of remote sensing data of particular interest to LTER scientists is imaging spectroscopy. With imaging spectroscopy, the recorded data contain the full solar reflected spectrum of the imaged landscape. These data can be analyzed to provide information about vegetation and soil properties. Imaging spectroscopy is performed using a hyperspectral sensor. Current hyperspectral measurements of the Sevilleta and Jornada LTER sites are made from aircraft carrying the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS). Later this fall NASA plans to launch a satellite carrying a similar instrument called Hyperion, providing the first spaceborne imaging spectrometer.

The collection, analysis, and application of hyperspectral data raise specific challenges in data management. In particular, the data files are large (the files comprising a single image may be as large as 80 GB), there are numerous files (each overflight consists of numerous raw and processed data files), the research community is heterogeneous and distributed (the LTER researchers are geographically distributed and functionally diverse; some are involved in analysis, while others are only data consumers), data load and demand are expected to increase significantly (especially with the launch of the Hyperion sensor), and the information infrastructure is dynamic (the hyperspectral data collections need to persist past changes to the current physical infrastructure). Other challenges for data management include organization (including metadata), access control and security, persistent archives and migration, and web-based access. These requirements are best addressed through the design and implementation of a scientific collections management system.

The focus of the LTER-Hyper-SRB project is to design and implement a collections management system for the LTER hyperspectral data products. This system is being built using the NPACI resources available at SDSC, specifically the High Performance Storage System (HPSS), the Metadata Catalog (MCAT), and the Storage Resource Broker (SRB).

The SDSC High Performance Storage System (HPSS) is a general purpose parallel storage system that is scalable in several dimensions: data transfer rate, storage size, number of name space objects, size of objects, and geographical distribution. The SDSC HPSS provides the physical storage for the LTER hyperspectral data products. The HPSS is designed to use network-connected as well as directly connected storage devices to achieve high transfer rates. The SDSC HPSS currently provides 480 terabytes (TB) of storage space, of which 170 TB is already in use.

The Metadata Catalog (MCAT) is a metadata repository system implemented at SDSC to provide a mechanism for storing and querying system-level and domain-dependent metadata using a uniform interface. The MCAT provides a resource and data-set discovery mechanism that can be used effectively to identify and discover resources and data sets of interest using a combination of their characteristic attributes instead of their physical names and/or locations. The MCAT also stores information about the locations where the replicas of the data sets are stored, access control lists for each data set, and audit trails of usage. Queries to the metadata catalog are resolved into (location, protocol) pairs for retrieval or manipulation of the data.
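The attribute-based discovery described above can be illustrated with a small in-memory sketch. This is not the MCAT interface; the class names (ToyCatalog, DataSet, Replica), the attribute keys, and the resource and protocol labels are all hypothetical, chosen only to show a query on attributes resolving into (location, protocol) pairs.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    location: str  # hypothetical storage resource name
    protocol: str  # hypothetical access protocol label


@dataclass
class DataSet:
    name: str
    attributes: dict
    replicas: list = field(default_factory=list)


class ToyCatalog:
    """Illustrative stand-in for a metadata catalog (not the MCAT API)."""

    def __init__(self):
        self._data_sets = []

    def register(self, data_set):
        self._data_sets.append(data_set)

    def query(self, **wanted):
        """Return (location, protocol) pairs for all replicas of every
        data set whose attributes match all of the requested values."""
        pairs = []
        for ds in self._data_sets:
            if all(ds.attributes.get(k) == v for k, v in wanted.items()):
                pairs.extend((r.location, r.protocol) for r in ds.replicas)
        return pairs


catalog = ToyCatalog()
catalog.register(DataSet(
    "flight_a_processed",
    {"site": "Sevilleta", "sensor": "AVIRIS", "level": "processed"},
    [Replica("archive-1", "hpss"), Replica("cache-1", "unix-fs")],
))
catalog.register(DataSet(
    "flight_b_raw",
    {"site": "Jornada", "sensor": "AVIRIS", "level": "raw"},
    [Replica("archive-1", "hpss")],
))

# The caller names characteristics, never a physical file name or path.
sevilleta_pairs = catalog.query(site="Sevilleta", sensor="AVIRIS")
print(sevilleta_pairs)
```

The point of the sketch is the direction of the lookup: the user supplies attributes, and the catalog answers with where and how each matching replica can be reached.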

The SDSC Storage Resource Broker (SRB) is client-server middleware implemented at SDSC to provide a uniform data handling interface to different types of storage devices. The SRB provides a uniform API (Application Programming Interface, a sort of standardized library toolbox that other programs use, in this case to link to the SRB programs) to connect to heterogeneous resources that may be distributed and to access data sets that may be replicated. The SRB can be used to access data sets distributed across file systems, databases, and archives. The container is a key concept in SRB. Containers provide the ability to aggregate small files into a single physical file before storage in an archive. When a data set is accessed, the SRB retrieves the appropriate container onto a disk cache, and then supports read/write commands on the data set that was stored in the container.
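The container idea can be sketched in a few lines. The code below is an illustration only, not SRB code: ToyContainer, the archive and disk_cache dictionaries, and the file names are hypothetical. It shows small files packed into one physical object with an offset index, and a read that first stages the whole container into a cache.

```python
class ToyContainer:
    """Packs many small files into one byte string plus an offset index.
    Illustrative only; this is not how the SRB lays out containers."""

    def __init__(self):
        self._blob = bytearray()
        self._index = {}  # member name -> (offset, length)

    def add(self, name, payload):
        self._index[name] = (len(self._blob), len(payload))
        self._blob.extend(payload)

    def read(self, name):
        offset, length = self._index[name]
        return bytes(self._blob[offset:offset + length])

    def physical_size(self):
        return len(self._blob)


archive = {}     # stands in for the archive: one entry per physical file
disk_cache = {}  # stands in for the disk cache in front of the archive

container = ToyContainer()
container.add("band_001.dat", b"\x01\x02\x03")
container.add("band_002.dat", b"\x04\x05")
container.add("header.txt", b"lines=512")
archive["container-0001"] = container  # three small files, one archived object


def read_data_set(container_id, name):
    """Stage the whole container into the cache, then read one member."""
    if container_id not in disk_cache:
        disk_cache[container_id] = archive[container_id]
    return disk_cache[container_id].read(name)


band_two = read_data_set("container-0001", "band_002.dat")
print(band_two, container.physical_size())
```

Aggregating in this way means the archive handles one large object instead of many small ones, at the cost of staging the full container even when only one member is needed.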

The SRB, in conjunction with the MCAT, provides a method for assembling distributed data sets into a collection. Since the data sets are controlled by the collection, it is possible to keep a persistent identifier associated with each data set, regardless of where the data set is moved.
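A minimal sketch of why collection control makes persistent identifiers possible: if every move goes through the collection, the identifier-to-location mapping can be updated in the same step. ToyCollection, the identifier string, and the location strings below are hypothetical and do not reflect SRB naming.

```python
class ToyCollection:
    """Maps persistent identifiers to current physical locations.
    Illustrative only; not the SRB/MCAT data model."""

    def __init__(self):
        self._locations = {}  # persistent id -> current location

    def add(self, persistent_id, location):
        if persistent_id in self._locations:
            raise ValueError(f"{persistent_id} is already in the collection")
        self._locations[persistent_id] = location

    def move(self, persistent_id, new_location):
        """Relocate a data set; the identifier itself never changes."""
        if persistent_id not in self._locations:
            raise KeyError(persistent_id)
        self._locations[persistent_id] = new_location

    def resolve(self, persistent_id):
        return self._locations[persistent_id]


collection = ToyCollection()
scene_id = "lter/sevilleta/scene-01"  # hypothetical identifier
collection.add(scene_id, "disk-cache:/staging/scene-01")

before = collection.resolve(scene_id)
collection.move(scene_id, "archive:/hyperspectral/scene-01")
after = collection.resolve(scene_id)

# Users keep citing scene_id; only the resolved location differs.
print(scene_id, before, after)
```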

Together, the HPSS, MCAT, and SRB provide the foundation for constructing a scalable, robust collections management system for the LTER hyperspectral data.

Plan

The plan is to develop a prototype system at SDSC, utilizing the SDSC HPSS, MCAT, and SRB. The MCAT will reside in the SDSC Oracle database. Scripts will be developed or adapted to move data between Greg Asner’s lab at the University of Colorado, Boulder, and SDSC. The initial system will implement a minimal set of functions (basic access control, security). User access will be provided through the SRB browser interface or the Unix command-line interface. Additional features will be added as needed, including additional metadata, web-access GUIs, and data cutter proxies for manipulating data sets. The option to migrate or replicate the collections management system will be considered in the future.

We envision that the system will serve as an accessible archive for LTER researchers, permitting data access similar to but more advanced than what LTER remote sensing researchers have experienced through the LTER Network Office archive for Landsat Thematic Mapper data (http://www.lternet.edu/technology/satellite/).

Current Status

According to George Kremenek, we entered beta production mode on June 24, 2000. Three compressed data files, each 500 MB, have been transferred to the SDSC SRB (HPSS). These have been uncompressed at SDSC, and the smaller files have been stored in an SRB container. These files have been accessed through the SRB client at Greg Asner’s lab. Next steps will involve generating applications in the LTER Network, providing access to a broader community of users (possibly through a web interface), and populating the system with additional data files.

PEOPLE

LTER

Greg Asner, Niwot Ridge LTER, asner@terra.colorado.edu
Joyce Francis, Sevilleta LTER, jfrancis@sevilleta.unm.edu
Kathy Heidebrecht, University of Colorado, kathy@cses.colorado.edu
Barbara Nolen, Jornada Basin LTER, bnolen@nmsu.edu
John Vande Castle, LTER Network Office

SDSC

Dave Archbell, dave@sdsc.edu
Tony Fountain, fountain@sdsc.edu
George Kremenek, kremenek@sdsc.edu
Reagan Moore, moore@sdsc.edu
Joshua Polterock, joshuap@sdsc.edu
Arcot Rajasekar, sekar@sdsc.edu

For more information, please visit the web site http://www.npaci.edu/DICE/SRB/index.html